Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Making code efficient*


    Hi I have written a code to convert the files from excel to state and merge variables from two different files and lastly append the data sets. is there a way to make the code efficient please it is taking a lot of time



    global temp_directory "C:\Users\ramzan\Downloads\Compressed\air_temp_201 4"


    global precip_directory "C:\Users\ramzan\Downloads\Compressed\precip_2 017"


    global years 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014


    ******* Import the temp files into stata format
    cd "$temp_directory"
    foreach dataset in $years{
    clear
    import excel using air_temp_`dataset'.xlsx, sheet("Sheet1") firstrow
    destring Year, replace
    *export excel using air_temp_`var'.xlsx , firstrow(variables) replace
    rename (Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec) (temp#), addnumber
    reshape long temp, i(lat longitude Year) j(month)
    label variable lat "Latitude"
    generate date = ym(Year,month), after(month)
    format %tm date
    egen id_g = group (lat longitude)

    *append using air_temp_
    cd "$temp_directory"
    save air_temp_`dataset'.dta , replace


    }




    ******* Import the precip files into stata format
    cd $precip_directory
    foreach dataset in $years {
    clear

    import excel using precip_`dataset'.xlsx, sheet("Sheet1") firstrow
    destring Year, replace
    *export excel using air_temp_`var'.xlsx , firstrow(variables) replace
    rename (Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec) (precip#), addnumber
    reshape long precip, i(lat longitude Year) j(month)
    label variable lat "Latitude"
    generate date = ym(Year,month), after(month)
    egen id_g = group (lat longitude)
    format %tm date
    *append using air_temp_
    save precip_`dataset'.dta , replace


    }


    ******* Merge temp and precip files

    foreach dataset in $years {
    cd $temp_directory
    use air_temp_`dataset'.dta, clear
    *rename Year year
    cd $precip_directory
    merge 1:1 id_g month Year using precip_`dataset'
    drop _merge id_g
    cd $temp_directory
    save air_temp_`dataset'.dta, replace

    }

    ******* Append temp and precip files
    clear
    use "air_temp_1981.dta", clear /* select first file*/
    foreach dataset in 1982{
    append using air_temp_`dataset' /*Combine datasets together*/
    sort Year month ID
    save air_temp_all.dta , replace
    erase air_temp_`dataset'.dta /*Erase files we're done with*/

    }
    gen temp_c = (temp-32)*(5/9)


    use air_temp_all.dta,clear

  • #2
    reshape can be slow in large datasets. You could look at fastreshape or sreshape on SSC (ssc desc fastreshape and ssc desc sreshape), to see if that noticeably improves speed in your case.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    Comment


    • #3
      Also available on SSC, and faster than official Stata's -reshape long- are -tolong- and -freshape-. And there is also -greshape- available as part of the -gtools- package at github.com/mcaceresb/stata-gtools.

      That said, you are reading and writing a large number of files. That is going to be slow if the files are of any appreciable size. It will be slower still if they are located on a drive that has to be accessed over a network (although I see they are on your C: drive, so presumably that is local).

      I also notice that your final loop starts with -foreach dataset in 1982{-, and that after -use-ing a 1981 file. So after generating intermediate results for 35 years, in the end you use only years 1981 and 1982. You could save a great deal of time by not bothering with the other years in the other loops. In fact, just changing the definition of global macro years to -global years 1981 1982- will reduce the entire computation by a factor of 17.5.

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        Also available on SSC, and faster than official Stata's -reshape long- are -tolong- and -freshape-. And there is also -greshape- available as part of the -gtools- package at github.com/mcaceresb/stata-gtools.

        That said, you are reading and writing a large number of files. That is going to be slow if the files are of any appreciable size. It will be slower still if they are located on a drive that has to be accessed over a network (although I see they are on your C: drive, so presumably that is local).

        I also notice that your final loop starts with -foreach dataset in 1982{-, and that after -use-ing a 1981 file. So after generating intermediate results for 35 years, in the end you use only years 1981 and 1982. You could save a great deal of time by not bothering with the other years in the other loops. In fact, just changing the definition of global macro years to -global years 1981 1982- will reduce the entire computation by a factor of 17.5.
        Thanks, Clyde Schechter


        ******* Append temp and precip files
        clear
        use "air_temp_1981.dta", clear /* select first file*/
        foreach dataset in 1982{
        append using air_temp_`dataset' /*Combine datasets together*/
        sort Year month ID
        save air_temp_all.dta , replace
        erase air_temp_`dataset'.dta /*Erase files we're done with*/

        }


        actually, this was only for testing and practice


        I have to append it for all of the years defined in the global variable.

        I have a question is there a way to define THE first value in $years and do the operation on all the values in $years expect the first value?


        Thanks





        Comment


        • #5
          Muhammad Ramzan you seem to need all years from 1980-2014. It might be easier to directly specify a numlist in the loop instead of the global. Also, if your question in #4 is because of the append command, you can instead just initialise an empty file and then append everything. So your code would look something like this:

          Code:
          clear
          save air_temp_all, replace emptyok
          
          foreach dataset of numlist 1980/2014 {
              dis "Appending dataset for the year `dataset'..."
              append using air_temp_`dataset' /*Combine datasets together*/
              sort Year month ID
              save air_temp_all , replace
              erase air_temp_`dataset'.dta /*Erase files we're done with*/
          }
          Last edited by Hemanshu Kumar; 12 Nov 2022, 01:46.

          Comment


          • #6
            Originally posted by Hemanshu Kumar View Post
            Muhammad Ramzan you seem to need all years from 1980-2014. It might be easier to directly specify a numlist in the loop instead of the global. Also, if your question in #4 is because of the append command, you can instead just initialise an empty file and then append everything. So your code would look something like this:

            Code:
            clear
            save air_temp_all, replace emptyok
            
            foreach dataset of numlist 1980/2014 {
            dis "Appending dataset for the year `dataset'..."
            append using air_temp_`dataset' /*Combine datasets together*/
            sort Year month ID
            save air_temp_all , replace
            erase air_temp_`dataset'.dta /*Erase files we're done with*/
            }
            Thanks @Hemanshu Kumar

            I have a question how I can change the file name from
            save air_temp_all , replace to
            save air_temp_1980_2004.dta , replace air_temp_firstyear from numlist _ last value form numlist

            Thanks

            Comment


            • #7
              and will doing qui use and qui save will speed up the calculations or not


              thanks

              Comment


              • #8
                Perhaps
                Code:
                local fyr 1980
                local lyr 2014
                clear
                save air_temp_`fyr'_`lyr', replace emptyok
                
                foreach dataset of numlist `fyr'/`lyr' {
                    dis "Appending dataset for the year `dataset'..."
                    append using air_temp_`dataset' /*Combine datasets together*/
                    sort Year month ID
                    save air_temp_`fyr'_`lyr , replace
                    erase air_temp_`dataset'.dta /*Erase files we're done with*/
                }
                and will doing qui use and qui save will speed up the calculations or not
                No, the quietly prefix in general will not speed up the processing significantly, but be sure you have
                Code:
                set more off
                so Stata doesn't pause each time the Results window is full.

                Comment


                • #9
                  Originally posted by William Lisowski View Post
                  Perhaps
                  Code:
                  local fyr 1980
                  local lyr 2014
                  clear
                  save air_temp_`fyr'_`lyr', replace emptyok
                  
                  foreach dataset of numlist `fyr'/`lyr' {
                  dis "Appending dataset for the year `dataset'..."
                  append using air_temp_`dataset' /*Combine datasets together*/
                  sort Year month ID
                  save air_temp_`fyr'_`lyr , replace
                  erase air_temp_`dataset'.dta /*Erase files we're done with*/
                  }
                  No, the quietly prefix in general will not speed up the processing significantly, but be sure you have
                  Code:
                  set more off
                  so Stata doesn't pause each time the Results window is full.


                  . foreach dataset of numlist `fyr'/`lyr' {
                  2. dis "Appending dataset for the year `dataset'..."
                  3. append using air_temp_`dataset' /*Combine datasets together*/
                  4. sort Year month ID
                  5. save air_temp_`fyr'_`lyr , replace
                  6. erase air_temp_`dataset'.dta /*Erase files we're done with*/
                  7. }
                  Appending dataset for the year 2013...
                  invalid '`'



                  Thanks I am getting the following error message

                  please

                  Comment


                  • #10
                    Please compare the two save commands and add the missing character to the second one.

                    Comment


                    • #11
                      Originally posted by William Lisowski View Post
                      Please compare the two save commands and add the missing character to the second one.
                      Thanks

                      Comment

                      Working...
                      X